Genomics Data Curation Roles, Skills, and Perception of Data Quality

Authors

  • Hong Huang
  • Corinne Jörgensen
  • Besiki Stvilia
Abstract

Compared to a decade ago, genomics scientists, driven by technical changes and the availability of massive genomic data, now perform a wider range of curation roles, including those of end-users, curators, and dual-role users. Scientists in different curation roles (including that of end-user) may focus on different data quality aspects and skill requirements in a community curation environment. This study examines how genomics scientists' perceived priorities for data quality and data quality skills differ according to the role they assume in genomics data curation work. The analysis of survey data collected from 147 genomics scientists found that curators of genomic data placed a higher value on quality criteria that can be assessed through direct examination of the data, while end-users placed a high value on quality criteria that can be assessed indirectly, such as believability. With regard to data quality skills, curators appeared to care more than end-users about understanding users' requirements and about specific data management skills, while end-users placed a higher value on the skills needed to deal with information overload, that is, those needed to identify useful, relevant information in large amounts of data. The study found that scientists with different curation roles, given common curation tasks with the same skill requirements, prioritized different data quality criteria. The data quality, skill priorities, and tradeoffs identified by this study can inform the development of effective data curation mandates and policies, data quality assurance planning and training, and the design of curation-role-specific tool dashboards and visualization interfaces for genomics data.

Similar resources

Domain knowledge and data quality perceptions in genome curation work

Purpose: This article aims to understand genomics scientists' perceptions of data quality assurance based on their domain knowledge. Design/methodology/approach: The study used a survey method to collect responses from 149 genomics scientists grouped by domain knowledge. They ranked the top-five quality criteria based on hypothetical curation scenarios. The results were compared using Chi-Squ...

Prioritization of data quality dimensions and skills requirements in genome annotation work

The rapid accumulation of genome annotations, as well as their widespread reuse in clinical and scientific practice, poses new challenges to management of the quality of scientific data. This study contributes towards better understanding of scientist perception and priorities for data quality and data quality assurance skills needed in genome annotation. Our study was guided by a previously de...

Big Data to Knowledge—Harnessing Semiotic Relationships of Data Quality and Skills in Genome Curation Work

This article aims to understand the views of genomics scientists with regard to the data quality assurances associated with semiotics and Data-Information-Knowledge (DIK). The resulting communication of signs generated from genomic curation work was found within different semantic levels of DIK that correlate specific data quality dimensions with their respective skills. Syntactic DQ dimension...

Study of the foundation, models and issues of research data curation and management in scientific and academic environments

Background and Aim: The purpose of this paper is to study, identify, and discuss the foundations and concepts, models and frameworks, and dimensions and challenges of research data curation and management in scientific and academic environments. Method: This is a review article, and a library research method was used to collect scientific and research texts in this field. In this research, external an...

From manual curation to visualization of gene families and networks across Solanaceae plant species

High-quality manual annotation methods and practices need to be scaled to the increased rate of genomic data production. Curation based on gene families and gene networks is one approach that can significantly increase both curation efficiency and quality. The Sol Genomics Network (SGN; http://solgenomics.net) is a comparative genomics platform, with genetic, genomic and phenotypic information ...

Publication date: 2014